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We present a method to compute the Fermi function of the Hamiltonian for a system of inde- 
pendent fermions, based on an exact decomposition of the grand-canonical potential. This scheme 
does not rely on the localization of the orbitals and is insensitive to ill-conditioned Hamiltonians. 
It lends itself naturally to linear scaling, as soon as the sparsity of the system's density matrix is 
exploited. By using a combination of polynomial expansion and Newton-like iterative techniques, 
an arbitrarily large number of terms can be employed in the expansion, overcoming some of the 
difficulties encountered in previous papers. Moreover, this hybrid approach allows us to obtain a 
very favorable scaling of the computational cost with increasing inverse temperature, which makes 
the method competitive with other Fermi operator expansion techniques. After performing an in- 
depth theoretical analysis of computational cost and accuracy, we test our approach on the DFT 
Hamiltonian for the metallic phase of the LiAl alloy. 

PACS numbers: 71.15.-m, 31.15.-p 



I. INTRODUCTION 

When calculating the various ground state properties 
of fermionic systems, it is important to have fast and 
accurate ways of evaluating the density matrix. For 
non-interacting fermions, this amounts to calculating the 
Fermi function associated with the system's Hamiltonian 
H. In many applications, H is either of empirical nature, 
or the result of a self-consistent density functional the- 
ory (DFT) calculation. The standard method for com- 
puting the density matrix requires diagonalizing H, an 
operation whose computational complexity scales cubi- 
cally with the number of electronic degrees of freedom N. 
Having a linear scaling scheme to obtain this quantity is 
a key step for modeling larger systems, thus making pos- 
sible the computational study of a vast class of problems, 
whose behavior cannot be described by smaller models. 
Areas in which such a technique would have a major im- 
pact include nanotcchnology and biochemistry, to name 
but a couple. 

Several methods have been proposed to circumvent 
diagonalizationi. These methods are based on the near- 
sightedness principle^, which guarantees that in the 
A*" — > oo limit the matrices needed to compute the Fermi 
operator will become sparse. Among the different ap- 
proaches that have been proposed, we might cite dividc- 
and-conquer schemes^, density-matrix minimization , 
Green's function^, maximally localized orbitals^ and 
penalty functions methods 2 .. The use of sparse matrix 
algebra eventually leads to linear scaling, both in terms 
of memory requirements and of computational cost. A 
second class of methods, on which we shall focus here, 
uses the finite-temperature Fermi operator. Due to the 
finite temperature, the singularity at the chemical po- 
tential [i is smoothed, thus allowing for an expansion in 
simpler functions of H. Since orbital localization is not 
explicitly exploited, this class of methods can also be ap- 
plied to metals. The earliest attempts in this direction 
were based on an expansion in Chcbyshev polynomials^. 



The computational cost of this method has been ana- 
lyzed by Baer and Head-Gordon^, who found that the 
order m of the polynomial needed to achieve a 1Q~ D ac- 
curacy depends linearly on the width of the Hamilto- 
nian spectrum AE and the electronic temperature 1 / /3, 
i.e. m ~ D/3AE. This obviously raises some problems 
when considering Hamiltonians with large AE, such as 
those arising from DFT calculations using plane wave ba- 
sis sets, or when low temperatures are required. Recently 
it has been suggested^ that fast polynomial summation 
methods, requiring a number of multiplications ~ \/m, 
can be applied to Fermi operator expansion, leading to 
the more favorable scaling \f(3AE. 

In this paper we revisit a particular form for the ex- 
pansion of the Fermi operator, which is based on the 
grand-canonical formalism and developed in a series of 
recent paper a 12 ' 13 i 14 i 15 . The grand-canonical potential 
for independent fermions is split into a sum of P terms, 
containing e~^ H ~ Al )/ 2P . As a consequence of this de- 
composition, the Fermi operator can be written exactly 
as a sum of P terms. The larger the number of terms, 
the easier the evaluation of the exponential: this implies 
a tradeoff between the size of P and the accuracy of the 
results. In this paper, we investigate the analytical prop- 
erties of this decomposition, finding that a large number 
of terms are almost ideally conditioned, and that their 
contribution to the Fermi operator can be easily and ef- 
fectively computed in a single shot with a polynomial 
expansion. The remaining few are tackled via a Newton- 
like iterative inversion scheme, which needs to be applied 
to each term individually but is very efficient in dealing 
with large D(3AE. With this hybrid approach, large val- 
ues of P can be reached at a cost that is modest and 
independent of the system size. This result can improve 
significantly the prefactor of other methods using sim- 
ilar decomposition :] 12 ' 13 ! 14 ! 15 . Moreover, using this ap- 
proach, we achieve a scaling of the operations count with 
D[3AE that is sublinear, and competitive with the result 
of RcfAi if their fast summation technique is used. In 
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this way, accurate, low-tcmpcrature calculations can be 
performed. 



II. PROPERTIES OF THE EXPANSION 

We use an expansion of the Fermi operator based on 
grand-canonical formalism, which has been developed 
and employed in several recent work a 12 ' 13 i 14 ' 15 . We sum- 
marize the derivation and the resulting expression here, 
introducing a slightly different notation. To simplify 
the expressions, we will set the zero of energy at fi, 
and measure energies in units of fc^T. This amounts 
to replacing in the standard expression for the Fermi 
operator (3 (H — /il) with H. Using this notation, the 
grand-canonical potential for a system of non-interacting 
fermions become o 16 ' 17 
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These expressions are analogous to those introduced in 
Refi^, apart from a change of indices (P/2 — > P, P + 
1/2- I -»•?). 

Using factorization ([3]), the grand-canonical po- 
tential can be written in compact form as f2 = 

-2Tr^£ =1 ln ( M ' M *)- TnG observables of interest for 
the system can be obtained as derivatives of the grand- 
canonical potential. In particular, the grand-canonical 
density matrix reads: 



sn 



1 2 P 

s- = - Vl-TrReMr 1 . (4) 

1 + e H P ^ 1 K ' 

i=i 



The decomposition (0| is exact for any value of P. As 
P increases, the exponential e~ H / 2P is easier to approx- 
imate. However, the number of M;S which have to be 
inverted increases. Previous works using this approach 
had to find the best compromise between the length of 
the expansion and the errors introduced by an approxi- 
mate evaluation of the matrix exponential, therefore los- 
ing the advantage of an exact expansion. In order to 
find a solution to this problem, it is useful to analyze 
the properties of the M/s in the large P limit. It turns 
out that matrices with small I are much more difficult to 
handle than those having a higher index. We therefore 
suggest applying different strategies in the two cases. 



\J\ + e~ x l p — 2e~ x / 2P cos Tcl/P, which is equal to Mi (x) 
(equation (J5J) within O (P^ 1 ). The dashed line corresponds 
to the locus of local minima. 
1.0 
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A. Properties of M; matrices 

Let us define the spectral radius of a matrix A as the 
maximum modulus of its eigenvalues, a (A) = maxi \<n\, 
and its condition number n (A) = a (A) /cr(A _1 ). We 
then introduce the shorthands Ae = <r(H), which is a 
measure of the width of the Hamiltonian's spectrum, and 
Se = 1/cr (H _1 ), which is of the order of the band gap 
in insulators, and tends to zero for metals. With this 
notation, the condition number of the Hamiltonian is 
K (H) = Ae/Se. In this section, we will obtain the cor- 
responding quantities for the M/s. In particular, we will 
show that k(M/) does not depend on P in the large P 
limit, and demonstrate that the M/s are always better 
conditioned than the Hamiltonian. 

We must consider how the spectrum of the Hamilto- 
nian is mapped by the function 



Mi (x)= 1 - e -(2'-D/2P e -z/2P 



(5) 



It is readily found that, for any P and x, Mi (x) is 
a monotonically decreasing function of I. For fixed 
I, Mi (\x\) < Mi (— | a; |) and the minimum value is 
Mi(x m in) = sin7r (21 — 1) /2P, which is reached for 
Xmin = — 2Plncos7r (21 — 1) /2P. From the plot of 
Mi (x) (Figure [T]) , it is apparent that the region which 
can lead to ill-conditioned matrices is the one with I <C P 
and x€?, where the spectrum of M; can contain eigen- 
values close to zero. In this region, an upper bound to 
the maximum eigenvalue is given by Mi (— Ae), and an 
estimate of the minimum eigenvalue within 0(1/ P) is 
Mi(Se). 

The following set of results can easily be proved by 
series expansion in powers of 1/P, assuming I <C P and 
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A. Series expansion for the tail 



a (Mi) 
a (Mr 1 ) 
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Ae 



(8) 



It can be seen from eq. ([5]) that the condition number 
K (M;) tends rapidly to one as I is increased, and is always 
smaller than k (H) (see also Figure [2]). Note that the last 
inequality in eq. ((SJ), valid for Se — > 0, shows that k (M;) 
is bounded also in the metallic case. 



FIG. 2: (color online) Condition number of M;, in the P — > oo 
limit, for a typical value of Ae. Dark (blue) and light (red) 
series correspond to the behavior for a metal and for an in- 
sulator (Se — 20). Even for a metallic system the condition 
number remains finite, and for the insulator it saturates at 
k (H). In both (M;) drops rapidly to one as I in- 

creases. 
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III. 



A HYBRID APPROACH TO THE 
EXPANSION 



The analysis performed above suggests dealing sepa- 
rately with the few, worst-conditioned M; matrices hav- 
ing I < I, and with those which have k(M() ~ 1, for 
I > I. The latter will form the "tail" contribution to the 
density matrix, and will be discussed first. 



In order to obtain a convergent power series for M ; _1 , it 
is convenient to perform an expansion around the diago- 
nal matrix z'l, where z' is an arbitrary complex number 
whose value will be chosen so as to accelerate conver- 
gence. Defining the shorthand Z; = e 17r ( 2i - 1 )/ 2P e - H / 2P ) 
one has 



Mr 1 



(i 



J=0 



z'l 



1 - z' 

Z; - z'l 
1-z' 



(9) 



The condition for convergence of is that the whole 
spectrum of (Z/ — z'l) / (1 — z') lies within the unit circle 
in the complex plane. Moreover, the convergence speed of 
the expansion will be determined by the eigenvalue which 
lies farthest from the origin (see Figure [3]). We refer to 
appendix [Al for a detailed analysis of the convergence 
ratio 



X 



Z, 



ke^ 1 1 



1 - ke^' 



(10) 



where we have set z' = fee 1 * 1 , defining <f>i = 
7T (21 — 1) /2P, and introducing the and complex-valued 
parameter k. There we show that, in the large P limit, 
one obtains an upper bound to the convergence ratio, i.e. 

X = Ae/ J Ae 2 + n 2 (21 — l) 2 < 1, provided one chooses 
for the optimal k the analytical estimate 



Ae 2 



2Ptt (21-1) 



(11) 



Having ensured that the series ([9]) converges, we can 
estimate the error made by truncating the power series 
after itit terms, 



*(<-')- g(^) -Mr- 

1 / Z t -Z' l\ j 1 

II - z'l . ^ ° { 1-z' 



'1 \ 1 



j=m T +l 



\l-z'\ 1-x 



(12) 



In order to achieve a 10 D relative accuracy on M ; , it 
is necessary to retain at least 



itit 



1 

lnx 



ln( i - 1 ) -DlnlO + Lnfll - z'\ a (M^ 1 )) 



terms. If we use eq. (jlll) and eq. (JTJl, setting Se = 0, and 
taking the large Ae limit, this estimate takes the simpler 
form 



m T « 2D 



it 2 (21 - 1) 



In 10 



(13) 
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FIG. 3: (color online) The picture sketches the transforma- 
tions in the complex plane leading from the Hamiltonian 
spectrum (1) to Z; (2), to Z; — z'l (3), and eventually to 
(Zj - z'l) / (1 - z') (4). The translation (2)->(3) and the 
scaling (3)— » (4) depend both on the choice of z' . As described 
in the text, it is always possible to choose the parameter so 
as to keep the whole spectrum within the unit circle, ensuring 
convergence of the power series ([9j . 

Im 




While the scaling with Ae 2 is not optimal, the depen- 
dence on l~ 2 limits its effects to the small-/ terms. These 
terms can be dealt with effectively with a different ap- 
proach, as we will show below. The influence of the Ae 2 
scaling on the overall operations count will therefore be 
limited. 

Thanks to the chosen z' parametrization the matrix 
powers entering eq. @ depend on I only by a scalar fac- 
tor, 

1-z' 1 - ke 1 ^ V J 



FIG. 4: (color online) The number of terms required to 
achieve 10 -3 relative accuracy in the polynomial expansion 
of M^ 1 is plotted for a Hamiltonian with minimum eigen- 
value —5, maximum eig envalue 10, and for P = 10 4 . A full 
line corresponds to results computed keeping k fixed to the 
I — 1 value, while dots correspond to the results computed by 
optimizing k separately for each value of /. Dark (blue) and 
light (red) series correspond respectively to the results based 
on the analytical estimate (|lip for k, and to the ones obtained 
by iteratively minimizing (|A1|I . Iterative refinement leads to 
a significant boost in performance. In any case, the number 
of terms computed for I = 1 largely exceeds the terms needed 
to compute the contributions for larger I values, even if k is 
not optimized on a case-by-case basis. 
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of tot on / offsets the effect of using a non-optimal k. It 
is easy to show, given the estimate (|11[) , that the number 
of terms computed for I largely exceeds the number of 
terms required to compute Mj -1 for any I > f, even if k 
is kept fixed to the valued optimized for I. Figure [4] shows 
that this is the case also when k is iteratively optimized 
starting from the analytical estimate. 



Therefore, we can compute the expensive powers 
(e~ H / 2P — kl) J just once, and obtain any M z _1 by com- 
bining them with the appropriate scalar coefficients. Fur- 
thermore, we often need just the overall contribution to 
the density matrix arising from the tail, which reads 



i=i 



m T (l) 



kl 



j An 



E 

i=i 



ke^') j 



(14) 

If either mj- or P is very large, computing the scalar 
coefficients in (fT4"|) implies a sizable overhead, which is 
however independent of the system size, and becomes 
negligible for large systems. 

In order to assess the accuracy of eq. (fl4"|) . further 
analysis is needed. If we want to reuse the powers 
^ e -H/2P _ , we must keep k fixed to the value op- 
timized at I. Expression (fT3"|) gives the number of terms 
required to compute M,~ with 10~ D accuracy, provided 
that k is optimized for each I. However, the dependence 



B. Newton inversion in the small-/ region 

To address the inversion of the worst-conditioned terms 
with / < /, which are too expensive to obtain by poly- 
nomial expansion, one could resort to one of the tech- 
niques described in our previous wor k 12 i 13 ' i 15 . In fact, 
the analysis performed so far can be seen as an improve- 
ment to those methods, since we can evaluate in one shot 
the contribution from the tail, lowering the number of 
terms which must be treated individually, and therefore 
improving the efficiency. 

In this section we will discuss an alternative ap- 
proach for computing the small-/ M^ 1 , based on a well- 
established Newton method for matrix inversion. We give 
a brief outline of the algorithm and some of its known an- 
alytical properties^, and will use them to estimate the 
number of operations necessary for our purposes. Given 
a non-singular, M x M matrix A, the iterative procedure 



B 



fc+i 



2B A 



BtABi, 



(15) 
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converges to A -1 . Defining R(B) = 1 — BA. the condi- 
tion for convergence is that X = a (R (Bo)) < 1, and the 
error after k iterations is 



^(A-i-Bfc) =a(B ) X (2fc) (l-xr 



(16) 



which corresponds to a number of multiplies (two per 
iteration) 



m N 



2 ln ln[lO-^(l- X )^(A- 1 )/ ( T(B )] 



In 2 



(17) 



needed to achieve a 10~ D relative accuracy. 

One must then face the problem of finding the approx- 
imate inverse Bo needed to start the iterations (|15p . The 
authors of Rcf^ suggested the simple form 



B„ 



AtfllAUJAIU- 



(18) 



where UA^ = max.,- Y,%Li I Aj I and = 

max, Y^,jLi ^ one uses ec l- convergence is 

guaranteed. Taking as usual the large P and Ae limit for 
a metallic system, one obtains m^r ~ In Ae+ln (D In 10)+ 
InM/ir 2 (21 — 1) as an estimate of the operations count 
to invert M g . Even if a feeble M-dcpcndence has been in- 
troduced in the operation count, the efficiency is greatly 
improved if one needs high accuracy or if Ae is large, 
thanks to the exponential convergence rate. 

It is however more effective to exploit the simple an- 
alytic form for M ; _1 to construct better initial guesses. 
For instance, one can use the following relation between 
Mr 1 and M ( -_V 



Mjl 1 ^ = M.l 1 e m6l / p [l + Mj" 1 



^6l/PJ2( e in8t/P_ 1 y M 
3=0 



3 - —0+i) 
I 



(19) 



to estimate a guess for Mi_« starting from an already- 
computed inverse. The series (|19[) converges provided 
that \e mSl / p - l| a (Mf 1 ) < 1. In the P -> oo limit 
this amounts to the condition SI < I — 1/2. In theory, 
all the terms up to I = 1 could be computed inserting 
any M ; _1 into eq. (|19[) . In practice, computing powers 
of M;" 1 is not advisable if we aim at linear scaling, since 
the M ; _1 s and their powers tend to be much fuller than 
the Hamiltonian, and the asymptotic convergence rate of 
cq. (|19p is worse than the one for the iterative inversion. 
In any case, the lowest-order approximation is already 
much more effective than the universal guess described 
in Reffi£. One finds that the convergence ratio for the 
computation of M,~ j , using the low-order extrapolation 

e"/ p M ; " 1 , is x ~ 27r/y / (5e 2 + 7r 2 (2Z- l) 2 , leading to an 
estimate of the number of the operations count 



m N 



In 2 
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FIG. 5: (color online) Total number of matrix-matrix multi- 
plications required to obtain the density matrix, combining 
series expansion and Newton inversion methods, on a log-log 
plot. Light (red) and dark (blue) lines correspond to 10 -5 and 
1CF 8 target accuracy respectively. Full (a), dashed (b) and 
dotted (c) lines correspond respectively to the number of op- 
erations estimated using the general-purpose initial estimate, 
using a zero i,l -order extrapolation guess and using extrap- 
olation together with fast polynomial evaluation in the tail 
region. Grid lines mark the slope expected for a linear depen- 
dence between Ae in units of fcsT and the overall operations 
count rriT- 
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This estimate is independent of Se because we considered 
the worst-case scenario where the system is metallic. It 
is also independent of M and - most importantly - of Ae. 
In practice, one starts from Mj -1 obtained from the poly- 
nomial expansion, then computes Mfj, using e 17r / p M ; rl 
as the initial guess, and continues stepwise, obtaining the 
initial estimate for iterative inversion of M;-_ 2 from the 
previously computed M7 1 , and so on. Alternatively, the 
first inverse matrix can be computed starting from 

the simple guess (|18[) . Efficient higher-order extrapola- 
tions will be discussed in appendix [Bl 

C. Overall operation count 

In the previous section we obtained (equations (|13[) 
and l[17p) an upper bound estimate of the number of 
matrix-matrix multiplications needed in order to obtain 
the tail contribution up to I, and to invert a single M/ 
using an iterative Newton method. The optimal value 
for I is obtained when the incremental cost of including 
an extra term in the tail contribution T; (cfr. cq.[T4|) be- 
comes larger than the cost of a single iterative inversion, 
i.e. when 



(I) - m T (I + l) > m T (I) ■ 
The overall number of multiplications is then 



i-i 



mtot 



m T (I) + ^ m N (0 • 



(21) 



(22) 



i=i 
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In figure [5] we plot the overall operations count ob- 
tained by using our theoretical estimates for my and 
tojv- A dramatic improvement is obtained when we use 
e I7r / p M z _1 as the initial guess for the inversion of M/_i. 
We can think of the extrapolated guess as an almost op- 
timal preconditioner and are considering how this could 
be exploited in different inversion schemes as well. It is 
worth noting that - despite the fact that the tail contri- 
bution requires a number of multiplies scaling quadrati- 
cally with Ae - the overall scaling is significantly sublin- 
car. Comparing our results (figure [5t>) with the multi- 
plication count for standard Chebyshev polynomials ex- 
pansion, as given by RefJ^, our method becomes ben- 
eficial by Ae ~ 20 - the break-even point getting lower 
as the target accuracy D is increased. Fast polynomial 
summation mcthod a 11 ^ 9 ' 20 can be used to compute both 
M^ 1 and Ty. This reduces the number of multiplies 
from rriT to 3y / TOr, however at the cost of storing an 
extra y'mr matrices. Combining these fast summation 
techniques with iterative inversion further lowers the op- 
erations count, leading to a scaling slightly better than 
\fKt (figure [St). In this case, however, the prefactor of 
our method is larger, so that the break-even point, when 
comparing with Ref i n i 19 i 20 , is shifted towards higher ac- 
curacy and large Ae. We are currently investigating the 
possibility of applying an alternative expansion of the tail 
contribution, which should make both our Ae scaling and 
the prefactor highly competitive. 



IV. A TEST CASE 



So far we have estimated the accuracy of the 



computation of each M ; 1 



M7 



(Mr 1 ) 



term using e 
as a measure of the error 



Mr 1 



H 1 

affecting the estimate M ; ~ x . However, the quantity 
we are more interested in is the band structure energy 
E = Tr [pH] . A theoretical estimation of the error on E 
requires several assumptions on the distribution of errors 
over the different eigenvalues of the Hamiltonian, and the 
different I terms, and we have not attempted it here. We 
have instead tested our method against a real system, 
selecting the self-consistent DFT Hamiltonian matrix of 
a 128-atom sample of the metallic fee phase of LiAl, as 
computed by the CP2K— & package^ 9 -. The orthogo- 
nal Hamiltonian matrix is obtained by multiplying the 
non-orthogonal one with the inverse square root of the 
overlap matrix^. We the computed with standard di- 
agonalization techniques the chemical potential and the 
exact band-structure energy for different electronic tem- 
peratures. We also obtained the bounds of the spectrum 
of H (e + = 121.15 eV and e_ = -42.65 eV), which are 
needed in eq. (|A1|) and could in principle be computed 
in linear scaling with the Lanczos method, or easily es- 
timated by Gcrshgorin's circle theorem^ or any matrix 
norm. 

We then applied our algorithm to the orthogonalizcd 



FIG. 6: (color online) Number of matrix-matrix multiplica- 
tions used to achieve a given error on the band structure 
energy, for different electronic temperatures. Details of the 
system are given in the text. The data points for every tem- 
perature, from left to right, correspond to 10~ 2 , 10~ 3 , 1CF 5 , 
and 1CP 7 target accuracy. 
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FIG. 7: (color online) The number of matrix-matrix multi- 
plies performed to compute the density matrix for the LiAl 
test case, for different electronic temperatures and target ac- 
curacies, plotted on a log — log scale, together with guidelines 
corresponding to a m oc V Ae scaling. 
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Hamiltonian, using fast polynomial summation to com- 
pute the tail and using first-order extrapolation in the 
Newton region, with a history vector containing the last 
two matrices (cfr. eq. (|B1I0 . Slight improvements in the 
operations count could be obtained by hand-tuning I, but 
we just used the automatic procedure based on our the- 
oretical estimates, as described in the previous section. 
In Figure [5] we plot the number of multiplications per- 
formed versus the resulting error on the energy. Since 
we can use a large value of P, e~ H / 2P can be computed 
with only a few matrix-matrix multiplies, which have not 
been included in the operations count. 

For a given target accuracy, the operations count scales 
better than ^/]3 (Figure [7]). We also observe that the ac- 
curacy of the energy is much better than the relative ac- 
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curacy guaranteed by the theoretical estimates. Consider 
for example that, by requiring a relative "spectral radius 
accuracy" better than 10~ 2 (first data points in Figurc[6]) 
we obtain a relative error on the energy of the order of 
1CP 4 (the total energy is ~ —5 keV). This is mainly due 
to the fact that the error in the energy is second order 
with respect to the error in the density matrix. However, 
we observe that also the error in the full density matrix, 
computed as the spectral radius of the difference with 
the result obtained with diagonalization, is in general al- 
most one order of magnitude smaller than the required 
accuracy. This result is probably due to a combination 
of effects: firstly, we use worst-case estimates, so that 
the accuracy of the individual terms is necessarily higher 
than the assumed one. Moreover, the errors affecting dif- 
ferent / terms might partially cancel each other out, and 
many of the contributions in the Newton region are com- 
puted with an accuracy much higher than requested, due 
to the exponential convergence. The accuracy improves 
very quickly as the number of operations increases un- 
til, for errors around 0.01 meV/atom, numerical issues 
come into play and prevent further refinement, which is 
anyway hardly necessary for most applications. 

Most of the observables relevant to electronic struc- 
ture calculations, such as forces and electronic density, 
are readily evaluated by expressions of the form (A) — 
Tr [pA] . Since the matrix A obeys the same sparsity 
as the Hamiltonian (A) depends only on a small subset 
of the nonzero elements of the density matrix. We are 
currently investigating whether it is possible to compute 
the expectation value directly, without evaluationg non- 
relevant elements of p, which would further improve the 
efficiency. 



V. CONCLUSION 

We have performed a detailed study of a recently- 
proposed form for Fermi operator expansion. The prop- 
erties of this expansion allow features of the expansion in 
polynomial and rational functions to be combined, and 
by optimizing the mixture we can have the best of both 
worlds. In this way, we circumvent the tradeoff between 
the number of terms and the accuracy of the expansion, 
which was needed by prior implementations of this ex- 
pansion of the Fermi operator. Moreover, sub-linear scal- 
ing of the matrix-matrix multiplications count with re- 
spect to the Hamiltonian range is achieved, making the 
method particularly attractive for low-temperature and 
high-accuracy applications. However, there is still room 
for improvement. In particular, work is in progress in 
the direction of a better polynomial expansion in the tail 
region. We are also considering applying the method 
to molecular dynamics. In this case one could use the 
M ; _1 s stored from the previous step as a starting point 
for iterative minimization. In this way, the computa- 
tion of the different /-channels can be made indepen- 
dent, adding a layer of parallelism on top of the parallel 



matrix-matrix multiply. Formal analogies between our 
expansion and Trotter factorization entering path inte- 
gral techniques suggest that some of the ideas presented 
here might be useful to tackle that problem as well. In 
order to achieve linear scaling, attention should be paid 
to the issue of matrix truncation, since here we have 
dealt only with matrix-matrix operations counts. Pre- 
liminary results show that in this respect there are no 
significant differences from standard expansion methods, 
as the minimum sparsity of the terms taken into account 
is basically the same as the sparsity of the whole density 
matrix, which is dictated by the physics of the system. 
The detailed analysis we have performed in this work has 
allowed us to obtain significant improvements over the 
previous applications of this decomposition of the Fermi 
operator, and lays solid foundations for further progress. 
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APPENDIX A: OPTIMAL PARAMETER FOR 
SERIES EXPANSION 

We show how the value of k in eq. PH]) can be op- 
timized to obtain faster convergence of the polynomial 
expansion. Expressions involved are quite lengthy, so 
we introduce several shorthands. Let e± be the bounds 
of the Hamiltonian spectrum. We parametrize k as 
k = (l + e w r/P), define e ^(2i-i)/2P = ( Vl + i Wt ) and 

s± = e~ c± / 2P . The square modulus of the extrema of 
the transformed Hamiltonian spectrum (see Figure [3]) is 

rf2 _ r 2 + P 2 (s± - if - 2Pr (s± - 1) cos 6 

± ~ r 2 + 2P 2 (1 - vi) + 2Pr (cosd (1 - v{) + wi sinfl) 

(Al) 

and the convergence ratio is x = max (d+, g?_). One can 
obtain an analytical estimate for k, and an upper bound 
for by taking the P — > oo limit, and making the sim- 
plifying assumption |e±| = Ae. This implies 8 = — it/ 2 
and leads to the estimate (|lll) . which can be further im- 
proved by minimizing numerically (|A1[) with respect to 
8 and r. 



APPENDIX B: HIGH-ORDER INITIAL GUESS 
FOR ITERATIVE INVERSION 

One can derive expressions for high-order extrapola- 
tion of inverse M/ matrices from equation (|19p . writing 
them as a linear combination of already-computed in- 
verses. We will sketch the procedure by deriving the 
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expression for the first-order extrapolation of M ; i , us- 
ing only M ; -1 and Mji, which is then easily extended 

to higher orders. Let cf ] = e -" i7r / p (e~ ni7T / p - l) J ~\ 
One can write the first-order extrapolations for the new 
inverse and for the already-computed one, as a function 
of powers of Mr" : 



This linear system can be solved for M, ^ and M ( 2 , 
obtaining 

M,- 1 ! ~ Ml 1 (e™' p + e 2i "/ p ) - Mj^e 31 ^. (Bl) 

For higher orders one simply inserts into the system more 
constraints, corresponding to "older" inverse matrices, 
and writes the extrapolation including higher powers of 
M ; _1 . The system is then solved in terms of these powers, 
eventually finding the coefficients for the estimate of the 
new inverse as a linear combination of the older ones. 
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